Hanxin Zhu

Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation

May 28, 2026
Via arXiv

GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion

May 13, 2026
Via arXiv

Embody4D: A Generalist 4D World Model for Embodied AI

May 03, 2026
Via arXiv

Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering

Mar 19, 2026
Via arXiv

PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance

Mar 19, 2026
Via arXiv

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Feb 14, 2026
Via arXiv

AR4D: Autoregressive 4D Generation from Monocular Videos

Jan 03, 2025
Via arXiv

GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs

Dec 22, 2024
Via arXiv

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

Dec 13, 2024
Via arXiv

Compositional 3D-aware Video Generation with LLM Director

Aug 31, 2024
Via arXiv